90 research outputs found

    Budgeted Knowledge Transfer for State-wise Heterogeneous RL Agents

    In this paper we introduce a budgeted knowledge transfer algorithm for non-homogeneous reinforcement learning agents. Here the source and the target agents are identical except in their state representations. The algorithm uses the functional space (Q-value space) as the transfer-learning medium. In this method, the target agent’s functional points (Q-values) are estimated in an automatically selected lower-dimensional subspace in order to accelerate knowledge transfer. During knowledge transfer, the target agent searches that subspace using an exploration policy and selects actions accordingly, so that it can obtain an appropriate estimate of its Q-table. We show both analytically and empirically that this method decreases the required learning budget for the target agent.

    Uncovering hidden network architecture from spiking activities using an exact statistical input-output relation of neurons

    Pinning down the structure of neural networks: creating a new map that links neural activity and circuit structure. Kyoto University press release. 2023-02-16. Charting a course in the brainy frontier. Kyoto University press release. 2023-02-17. Identifying network architecture from observed neural activities is crucial in neuroscience studies. A key requirement is knowledge of the statistical input-output relation of single neurons in vivo. By utilizing an exact analytical solution of the spike timing for leaky integrate-and-fire neurons under noisy inputs balanced near the threshold, we construct a framework that links synaptic type, strength, and spiking nonlinearity with the statistics of neuronal population activity. The framework explains structured pairwise and higher-order interactions of neurons receiving common inputs under different architectures. We compared the theoretical predictions with the activity of monkey and mouse V1 neurons and found that excitatory inputs given to pairs explained the observed sparse activity characterized by strong negative triple-wise interactions, thereby ruling out the alternative explanation by shared inhibition. Moreover, we showed that the strong interactions are a signature of excitatory rather than inhibitory inputs whenever the spontaneous rate is low. We present a guide map of neural interactions that helps researchers specify the hidden neuronal motifs underlying the interactions observed in empirical data.

    Context Transfer in Reinforcement Learning Using Action-Value Functions

    This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different state or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents’ MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task.
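
The transfer step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how intervals are built or merged, so the choices here (each interval is the min and max of the mapped source Q-values, merging takes the enclosing interval, and the target Q-table starts at the interval midpoint) are assumptions, and all function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

StateAction = Tuple[Hashable, Hashable]
QTable = Dict[StateAction, float]
Interval = Tuple[float, float]
# Partial mapping: target (state, action) -> matching source (state, action) pairs.
PartialMapping = Dict[StateAction, List[StateAction]]


def transfer_intervals(source_q: QTable, mapping: PartialMapping) -> Dict[StateAction, Interval]:
    """Map one source task's Q-values onto target pairs as [low, high] intervals."""
    intervals: Dict[StateAction, Interval] = {}
    for target_sa, source_sas in mapping.items():
        values = [source_q[sa] for sa in source_sas if sa in source_q]
        if values:  # the mapping is partial, so unmapped pairs are skipped
            intervals[target_sa] = (min(values), max(values))
    return intervals


def merge_intervals(per_source: Iterable[Dict[StateAction, Interval]]) -> Dict[StateAction, Interval]:
    """Assumed merge rule: keep the smallest interval enclosing all source intervals."""
    merged: Dict[StateAction, Interval] = {}
    for intervals in per_source:
        for sa, (low, high) in intervals.items():
            if sa in merged:
                old_low, old_high = merged[sa]
                merged[sa] = (min(old_low, low), max(old_high, high))
            else:
                merged[sa] = (low, high)
    return merged


def initial_q(merged: Dict[StateAction, Interval],
              target_pairs: Iterable[StateAction],
              default: float = 0.0) -> QTable:
    """Assumed initialization: interval midpoint where knowledge exists, else a default."""
    return {
        sa: (merged[sa][0] + merged[sa][1]) / 2.0 if sa in merged else default
        for sa in target_pairs
    }
```

Target pairs that no source task covers keep the default value, so ordinary Q-learning can proceed from this table without special handling for the gaps in the partial mapping.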

    Energy Efficient Locomotion with Adaptive Natural Oscillator

    For robotic systems, energy efficiency is one of the most crucial goals. Many studies have pursued this goal from the design and control points of view. From the control point of view, one of the preferred methods is to design the desired trajectory in harmony with the dynamics of the system, i.e., to exploit the natural dynamics. Assuming a structure for the desired trajectory, such as sinusoidal trajectories, we can have a parametrized control system, as in a CPG network. Therefore, an adaptation method that tunes those parameters toward energy efficiency can be beneficial to the control of robotic systems.